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DATA  ANALYSIS  AND  MODELING  OF  ARCTIC  SEA  ICE 
SUBSURFACE  ROUGHNESS 

D.  P.  Gaver 

P.  A.  Jacobs 

Department  of  Operations  Research 

Naval  Postgraduate  School 

Monterey,  California  93940 

Executive  Summary 

Arctic  sea-ice  behavior,  particularly  roughness  and  ridging  (keeling) 
patterns  above  and  below  the  surface,  is  of  scientific  interest  to  oceanog- 
raphers  and  geologists.  It  is  also  of  potential  interest  to  those  conduct- 
ing military  operations  in  the  Arctic,  and  to  those  exploring  for  petroleum 
and  other  minerals.   In  particular,  military  operations  involving  submarines 
are  facilitated  by  access  to  the  surface  from  below,  so  indications  of  the 
distribution  of  "leads"  or  "polynyas",  or  of  relatively  thin  ice  regions,  are 
of  interest. 

This  paper  investigates  the  statistical  distribution  of  relatively  deep 
keels  occurring  beneath  the  ice.  Such  keels  may  provide  obstacles  to  under- 
ice  vehicles;  detached,  they  may  be  the  agents  by  which  gouging  of  the  bottom 
occurs.  Such  gouges  threaten  pipelines  or  cables. 

The  present  work  is  based  upon  data  furnished  by  Dr.  Peter  Wadhams  of 
Scott  Polar  Research  Institute,  Cambridge,  England.  It  was  originally 
obtained  by  upward-looking  sonar  aboard  the  submarine  U.S.S.  GURNARD  during 
April,  1976,  for  a  route  beneath  the  Beaufort  Sea  ice. 

The  methods  utilized  are  those  of  exploratory  data  analysis  and  of 
fitting  apparently  suitable  statistical  distributions  (probability  models) 
to  the  distances  between  successive  deep  (>  30  ft.)  keels,  and  to  the  depths 
of  the  keels  identified.  The  data  suggest  that  an  exponential -like,  but  not 


precisely  exponential,  model  may  well  represent  the  data:  the  simple  expo- 
nential "model"  favored  by  others  previously  tends  to  underestimate  the 
distances  between  keels,  and,  perhaps  more  importantly,  to  underestimate 
the  extreme  keel  depths.  This  is  of  interest,  for  deep  projections  are  most 
likely  to  impede  vehicle  progress,  and  to  cause  gouging. 

The  present  numerical  summaries  are  of  a  pilot  study.  The  methodology 
used  has  more  and  more  widely  applicable  elements:  the  "sculptured 
exponential"  distributions  utilized  here  may  well  be  of  service  for  summarizing 
other  data  involving  the  environment. 
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DATA  ANALYSIS  AND  MODELING  OF  ARCTIC  SEA  ICE  SUBSURFACE  ROUGHNESS 

D.  P.  Gaver 
P.  A.  Jacobs 

Department  of  Operations  Research 

Naval  Postgraduate  School 

Monterey,  California  93940 

1 .  Introduction 

The  spatial  pattern  of  the  sea  ice  cover  in  the  Arctic  has  been  of 
considerable  scientific  interest  to  geophysi cists  and  oceanographers  for 
some  time.  Its  presence  importantly  affects  the  environment  for  naval  and 
other  military  operations,  and  for  oil  and  mineral  exploration.  In  partic- 
ular, naval  submarine  operations  are  influenced  by  the  existence  of  deep 
downward  projections  ("ice  keels")  from  the  surface  canopy,  by  acoustic 
wave  reflections  from  the  underside  of  that  canopy,  and  by  the  apparently 
random  incidence  of  essentially  open  regions  in  the  ice  pack  ("leads"  or 
"polynyas")  that  permit  access  to  the  surface  from  below. 

In  this  paper  statistical  methods  are  used  to  characterize  and  summa- 
rize features  of  the  Arctic  ice  pack  related  to  those  mentioned  above.  The 
analysis  is  based  on  a  particular  set  of  data  furnished  by  Dr.  Peter  Wadhams 
of  Scott  Polar  Research  Institute,  Cambridge,  England,  to  whom  we  are 
grateful.  A  previous  analysis  of  these  data  has  been  reported  by  Wadhams 
and  Home  [1980],  hereafter  abbreviated  WH_.  While  the  approaches  of  earlier 
investigators  have  lead  to  simple  one-parameter  exponential  distributions 
as  summaries  of  data  describing  (a)  spatial  intervals  between  keel  occurrences, 
and  also  (b)  keel  depths,  our  definitions  and  data  analysis  suggest  that  both 
keel  spacings  and  keel  depths  are  longer-tailed  than  the  exponential.  We  further 
suggest  simple  parametric  forms  to  summarize  the  observed  statistical  behavior. 


2.  The  Data 

The  data  we  analyze  were  obtained  by  upward-looking  sonar  aboard  the 
submarine  U.S.S.  GURNARD  during  the  period  April  7-10,  1976,  from  beneath 
the  Beaufort  Sea  ice  canopy.  The  route  followed  by  the  GURNARD  was  from 
a  point  north  of  Barter  Island  (just  over  70°  N.)  to  75-76°  N,  thence  south- 
easterly to  a  point  72-73°  N,  and  finally  westerly  to  a  point  northeast  of 
Pt.  Barrow.  For  a  detailed  map  see  WH.  The  data  --  ice  drafts,  measured 
from' below  the  ice  to  the  surface  --  were  taken  over  a  1400  km.  transect 
length.  Data  tapes  were  initially  cleaned  and  processed  at  the  Arctic 
Submarine  Laboratory,  Naval  Undersea  Center,  San  Diego;  they  were  later 
further  processed  at  Scott  Polar  Research  Institute,  and  observations  which 
were  taken  at  intervals  of  1.3  -  1.5  m.  were  referred  by  interpolation  to 
a  nominal  1.0  m.  spacing.  Furthermore,  the  data  file  was  split  into  sections, 
each  of  which  make  up  about  50  km  of  data.  There  were  27  such  sections, 
with  a  gap  appearing  between  two  of  them.  More  detail  is  available  in  WH. 

Certainly  the  data  set  referred  to  can  provide  considerable  information 
concerning  the  underside  of  Arctic  ice.  However,  there  are   recognizable 
limitations  in  the  inferences  that  may  be  well-justified  from  even  a  sophis- 
ticated analysis  of  these  particular  measurements.  For  instance,  the  data 
were  obtained  during  a  relatively  short  period  of  time  in  one  year,   so  there 
is  no  opportunity  to  assess  month-to-month  or  season-to-season  variability. 
In  order  to  obtain  more  information,  more  data  must  be  subjected  to  analysis. 
Our  purpose  here  is  to  suggest  methods  of  analysis  that  may  be  useful  when 
such  data  become  available. 


3.  Data  on  Keel  Spacings  and  Keel  Depths:  Definitions 

The  raw  data  on  ice  drafts  were  transformed  into  data  on  keel  spacings 
and  magnitudes  by  the  simple  expedient  of  constructing  an  imaginary  line, 
L  ,  at  a  constant  depth  d(feet)  below  sea  surface,  and  then  measuring  dis- 
tances (spacings)  between  successive  up-  and  down-crossings  of  L  ,  denoted 
generically  by  x  ,  and  the  maximum  depth  (keel  depth,  relative  to  d) 
achieved  between  a  down-crossing  and  the  first  subsequent  up-crossings, 
denoted  by  y  .  Figure  1  should  clarify  this  definition,  which  differs 
somewhat  from  that  of  WH:  it  permits  the  occurrence  of  more  small  spacings 
than  does  theirs. 

Data  on  spacings  and  keels  were  initially  obtained  for  three  levels: 
d  =  30  (feet),  40,  50.  These  depths  are  apparently  of  interest  from  a  sub- 
marine operational  view  point,  but  are  too  deep  to  be  of  great  interest  to 
acousticians.  Further  analysis  of  crossings  at  smaller  depths  is  in  progress 
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4.  Exploratory  -  Descriptive  Analyses  of  Spacings 

The  initial  step  in  the  analysis  of  spacings  was  to  select  a  data  seg- 
ment running  from  latitude  71.140,  longitude  144.225  to  latitude  74.328, 
longitude  144.378  and  to  graphically  display  the  spacings  in  various  ways  to 
see  if  any  apparent  patterns  emerged.  We  discuss  here  spacings  at  a  depth 
of  30  ft. 

(a)  Serial  plot.  This  is  a  plot  of  x.  vs  i  ,  where  i  indicates  the 
order  in  which  the  spacing  occurred  along  the  track;  i  =  1  refers 
to  the  first  spacing,  i  =  2  to  the  second  encountered,  etc.  Such  a 
plot  appears  in  Figure  2  .  There  is  some  visual  evidence  that 
fewer  long  spacings  occur  among  the  first  200  or  so  (out  of  more 
than  600),  hence  that  the  series  may  be  somewhat  non-stationary. 

(b)  Serial  histograms.  The  data  were  segregated  or  binned  in  groups 
of  size  73  (convenient  fraction  of  total  number)  in  the  order  of 
their  occurrence,  and  each  group  was  histogrammed;  see  Figure  3  . 
This  presentation  reveals  the  exponential-like  positive  skewness 
(J-shapedness)  of  the  data,  and  also  suggests  that  long  spacings 
tend  to  occur  late  along  this  segment  of  the  track. 

(c)  Serial  boxplots.  The  groups  of  73  were  next  box-plotted;  see 
Tukey  [1977],  and  the  means  joined  by  an  eye-guiding  line.  Once 
again  the  picture  indicates  that  the  longest  spacings  tend  to 
occur  later  along  the  track.  There  is  a  slight  upward  trend  no- 
ticeable in  the  mean  line  that  is  probably  attributable  to  the 
influence  of  the  largest  spacings  in  each  group;  a  similar  plot 
connecting  medians  would  not  likely  show  much  trend;  Figure  4. 

(d)  Histogram  of  log  (spacings),  all  data.  If  data  are  highly  skewed, 
some  form  of  symmetrizing  transformation  is  often  useful;  see 
Tukey  [1977],  McNeil  [1977].  The  log  transformation  tends  to 


symmetrize  extreme  positive  skewness  (as  in  our  histograms  or  in 
an  exponentially  distributed  sample);  it  may  also  reveal  hidden 
patterns;  see  McNeil  [1977]  p.  11.  Figures  5  and  6  show  the  effect 
of  such  a  transformation  on  the  histogram  of  the  30  ft.  spacings 
data:  two  roughly  symmetrical  but  pronounced  concentrations  of 
values,  or  "bumps",  become  evident.  The  lower  bump  can  be  identi- 
fied with  very   closely-spaced  up  and  down  crossings  occurring  across 
the  bottom  of  a  large  ice  structure  (keel);  the  present  way  of 
identifying  keels  allows  these  "pseudo  keels",  which  are  usually 
defined  by  relatively  shallow  gouges,  to  appear,  while  the  approach 
of  WH.  suppresses  them.  The  upper  bump,  made  up  of  about  half  the 
data  at  d  =  30  ft.,  represents  the  genuine  keel  spacings  believed 
to  be  of  operational  significance.  The  lower  bump  describes  high- 
frequency  keel  occurrences  likely  to  be  of  interest  to  acousticians, 
but  this  subset  of  data  is  also  perturbed  by  measuring  instrument 
noise;  these  data  should  first  be  smoothed  to  minimize  the  noise 
contribution.  At  d  =  30  ft.  the  split  between  the  lower  and  upper 
bumps  identified  occurs  at  nearly  the  median  of  the  data,  or  at 
about  70  m. 
On  the  basis  of  these  last  observations  the  analysis  to  follow  will 
focus  on  attempts  to  summarize  parametrically  the  distribution  of  data 
associated  with  the  upper  bump  identified  in  the  log-plot.  These  data  appear 
to  be  comparable  to  the  spacing  data  discussed  by  WH.  and  others.  They 
appear  also  to  have  the  most  significance  to  those  concerned  with  subsurface 
vehicular  (submarine)  operations. 


More  precisely,  the  first  200  observations  of  the  series  are  dropped. 
The  median  of  the  remaining  data  is  calculated  and  those  data  points  less 
than  the  median  are  dropped.  Finally,  the  median  is  subtracted  from  the 
remaining  data  points.  Results  of  the  analysis  of  this  upper  half  of  the 
spacings  data  will  follow. 
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5.  Towards  a  Parametric  Description  of  Spacings 

A  simple  exponential  distribution  summarization  of  spacings  data  has 
been  suggested  by  Hibler  [1972],  and  adopted  by  WH_.  also  for  purposes  of 
discussion. 

Model  1:  X  ~  Exponential .  Let  X  be  a  random  variable  representing 
a  typical  spacing.  Then  if  x  is  any  positive  number,  the  simple 
exponential  model  is  that 

P{X  >  x}  =  e"yX  ,  (5.1) 

so  the  probability  density  of  spacings  is 


fx(x;y)  =  e"yXy       x  >  0  (5.2) 


and  u     is  the  rate  of  occurrence  of  spacings  per  unit  distance;  equivalently, 

E[X]  =  1  .  (5.3) 


This  says  that  the  population  average  is  1/y  .  It  is  well  known  that  the 
maximum  likelihood  estimate  of  y  in  model  (5.1)  is  simply  y  =  (x)~  , 
the  inverse  of  the  average  of  observations  on  X  ,  supposing  that  successive 
spacings  are  identically  distributed  and  independent.  A  time  series  (lagged 
correlation  and  spectrum)  analysis  of  successive  spacings  at  d  =  30  ft. 
gives  evidence  of  only  wery   weak  dependence  between  spacings;  such  dependence 
will  be  ignored  in  what  follows. 

An  informal  but  informative  check  for  the  suitability  of  the  exponen- 
tial model  is  to  examine  a  plot  of  the  order  statistics  X/.\  of  the  (upper 
half  of  the)  data  to  the  corresponding  expected  exponential  order  statistics: 
a  straight  line  relationship  signifies  that  the  simple  exponential  fits 
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well.  Examination  of  Figure  8  shows,  and  Figure  9  reinforces,  the  appearance 
of  a  systematic  upward  bow  in  the  data,  signifying  a  systematically  longer- 
than-exponential  right  tail.  Such  an  effect  is  also  present  for  d  =  40  ft., 

and  d  =  50  ft.,  but  the  curvature  becomes  progressively  less  noticeable 
as  the  depth,  d  ,  increases.  It  has  been  noted  also  that  as  depth  decreases 
(eg.  d  =  10  ft.),  there  is  a  tendency  for  bowing  to  occur  in  the  opposite 
direction,  i.e.  for  spacings  to  become  shorter-tailed  than  exponential. 

A  numerical  summary  of  observed  data  characteristics,  in  addition  to 
the  message  of  Figures  8  and  9,  thus  suggests  the  need  for  a  representation 
other  than  the  exponential;  alternatives  are  considered  in  the  following 
sections. 

Table  1. 

Moment  and  Quantile  Summaries  of 
Spacings  in  Excess  of  Median  (70  meters)  at  Depth  d  =  30  ft. 

Mean:  x  =  0.929  (km.) 

s2   =  var[X]  =   1.485,   s  =   1.218 

Coeff.  of  variation  =  *  =  1.311  (1.0)* 

x 
Skewness  =  y-,   =  2.678  (2)* 

A 

Kurtosis  =  y2  =  9.853  (6)* 
Lower  Quartile  =  Q  =  0.155 
Median  =  0.487 
Upper  Quartile  =  Q  =  1.279 

*Numbers  in  parentheses  are   characteristic  of  an  Exponential  model. 
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The  occurrence  of  a  near-exponential  distribution  of  spacings  between 
deep  keels  is  perhaps  not  surprising,  in  that  ice  structure  formation  appears 
to  have  a  random  nature  without  much  long-term  order.   It  is  the  near- 
independence  of  ice  structure  sizes  in  neighboring  parts  of  the  pack  that 
is  probably  responsible  for  the  near-exponential ity  of  the  spacings  observed. 
It  is  interesting  that  the  spacing  between  two  consecutive  fixed  high  level 
upcrossings  in  certain  Gaussian  processes  can  be  shown  to  be  approximately 
exponential;  see  Cramer  and  Leadbetter  [1967],  Chap.  12.  Of  course  ice 
depths  are  by  no  means  Gaussian,  but  the  conditions  of  weak  long-run 
dependence  relied  upon  to  produce  the  Gaussian  result  are  approximately 
present  for  the  ice  data  as  well.  See  also  results  of  Gaver  and  Jacobs 
[1981]  regarding  the  probability  of  reaching  a  high  level  in  a  non-Gaussian 
process.  Agreement  with  the  exponential  distribution,  both  for  spacings 
and  keel  depths,  seems  to  improve  with  increase  in  reference  depth,  d,  as 
is  to  be  expected;  numerical  and  graphical  results  are  not,  however,  given 
in  this  report. 


6.  Modification  of  the  Exponential  Model:  The  Sculptured  Exponential 

We  propose  to  fit  the  upper  half  of  the  d  =  30  ft.  spacings  data,  i.e. 

the  magnitude  of  the  spacing  that  exceeds  the  median  spacing  (about  70  m.) 

by  a  modified  or  scul ptured  exponential. 

Model  2:  Linearly  Sculptured  Exponential.  As  an  alternative  to  the 

exponential  (5.1)  set 

X  =  AZ(1  +  CZ)  ,  (6.1) 

where  X  represents  a  spacing,  Z  is  a  unit  exponential  basic  r.v.  and  A 
and  C  are  constants,  A  being  a  scale  and  C  reflecting  departure  from 
exponential ity.  The  term  (1  +  CZ)  "sculptures"  Z  by  leaving  small  values 
of  Z  virtually  unchanged  (1  +  CZ  :  1  for  Z  small),  but  expanding  large 
values  (1  +  CZ  ~  CZ  for  Z  large).  Since  X  is  represented  as  a  monotonic 
increasing  function  of  Z  ,  we  can  represent  the  order  statistics,  x/-\>  as 
fol lows : 


X(J)  •  AZ(j)(l  +  CZ(j))  ,  (6.2) 


that  is  the  size-ordered  X-values,  X/-.X  <  X,^\   <  . . .  <  X/  x  are  easily 

represented  in  terms  of  those  for  Z  ,  Z/.x  .  Furthermore,  if  Z,  M  is 
F  (j)  (j) 

the  jth_  order  statistic  of  a  unit  exponential,  then 

e1  e?  e. 

Z/.x  =  —  + -^r  +  ...  +  — 4r  ,  (6.3) 

(j)    n   n-1        n-j+1 

so  the  basic  exponential  Z/.x  is  represented  as  a  weighted  sum  of  exponential 
gaps;  here  {e.}  is  a  sequence  of  iid  exponential  random  variables.  Even 
imbedding  representation  (6.3)  into  (6.2)  or  (6.1)  is  not  difficult;  exact 
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expectations  of  X,  Vs  are  easily  found.  Note  that,  similarly,  the 
quantiles  or  percent  points,  x(a)  ,  of  X  can  be  written  in  terms  of  those 
of  Z,  z(a): 

x(a)  =  Az(a)(l  +  Cz(a))  ,      (0  <  a  <  1)  .  (6.4) 

Thus  sculpturing  gives  a  simple  representation  of  the  inverse  distribution 
of  X  in  terms  of  that  of  Z  . 

The  above  considerations  suggest  that  sculpturing  is  sometimes  a  natural 
way  of  fitting  a  Wilk-Gnanadesikan  [1968]  q-q  plot.  For  further  discussion 
see  Gaver  and  Acar  [1979],  and  Gaver  [1982]. 

Expressions  for  the  moments  and  (Pearsonian)  skewness  and  kurtosis  of 
(6.1)  are   obtainable  from  the  following  formulas: 

E[Xk]  =  Ak  I    (k)(j+k)!CJ  ;      k  =  1,2,3,...  (6.5) 

j=0  J 


in  particular 


E[X]  =  A(1+2C)  ,  E[X2]  =  A2(2+2x3!C  +  4!C2)  (6.6) 


E[X3]  =  A3[3!  +  3x4!C  +  3x5!C2  +  6!C3  , 

E[X4]  =  A4[4!  +  4x5!C  +  6x6!C2  +  4x7!C3  +  8!C4]  .  (6.7) 


From  these  expressions  there  follow  formulas  for  central  moments  obtained 
by  substitution  into  the  usual  general  formulas: 
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Var[X]  =  E[X2]  -  (E[X])2 


(6.8) 


Skew  X  =  E[X3]  -  3E[X2]E[X]  +  2(E[X])3 

(Var[X])3/2 


Kurt[X]  =  E^4]  +  6E[X2](E[X])2  -  4EfX3]E[X]  -  3(E[X])4 


(Var[X])' 


(6.9) 


(6.10) 


Furthermore,  explicit  formulas  for  the  distribution  and  density  functions  of 
Model  2,  (6.1),  can  be  derived. 


FY(x;  A,C)  =  P{X  <  x}  =  1  -  exp 


2x 


+  J£~+ 


A  +  /A"  +  4ACx 


is  the  distribution  function,  and 


fY(x;  A,C)  =  exp 


2.x 


a  +  Ar  + 


4ACx 


/A2  + 


4ACx 


is  the  density, 


(6.11) 


(6.12) 


The  distributional  form  resembles  that  of  an  ordinary  exponential  with  scale 
A  and  shape  parameter  1  for  small  x  ,  gradually  transitioning  into  a 
longer-tailed  Weibull  with  shape  parameter  1/2  as  x  increases.  A  modifi- 
cation of  the  cumulative  hazard  in  (6.11)  suggests  itself:  instead  of 


2x/(A  +  A     +  4ACx)  consider  more  generally  Ax/{a.  +  (l-a)(l+yx  )q};  this 
leads  to  the  distributional  form 


Fx(x;A,a,Y,p,q)  =  1  -  exp 


Ax/{a  +  (l-a)(l+YXP)q} 
p,q,A,7  >_  0  ,    0  <_  a  <_  1  . 


(6,13) 
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Note  that  (6.13)  allows  the  representation  of  both  ultimately  long-tailed 
and  ultimately  short-tailed  distributions  relative  to  the  exponential,  but 
maintains  exponential -1  ike  behavior  for  small  x  .  Unfortunately,  simple 
structural  behavior  of  random  variables  akin  to  (6.1)  disappears.  The 
properties  of  distributions  such  as  (6.13)  will  be  the  subject  of  future 
exploration;  they  are  not  considered  further  in  this  report. 

Model  3:  Exponentially  Sculptured  Exponential.  A  further  alternative 
to  the  simple  exponential  (5.1)  is  of  the  form 


X  =  AZeCZ,     A,  C  >  0  .  (6.14) 


CZ 
The  sculpturing  term  e  '   again  leaves  small  values  of  Z  (exponential) 

nearly  unchanged,  but  considerably  extends  large  Z  values.  Once  again  by 

monotonicity  we  have 

CZ,  .x 
X(j)  =  AZ(J)6         (J  =  1>2"-"  n)  (6J5) 

and 

x(a)  =  Az(a)eCz(a),      (0  <  a  <  1)  (6.16) 


for  order  statistics  and  quantiles  of  X  from  (6.14). 

Expressions  for  the  moments  of  this  model  come  by  differentiating  the 
moment-generating  function  of  Z  , 


E[eeZ]  =  (1-e)-1,      (0  <  G  <  1); 
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(C  <   1) 

(6.17) 

(C  <   1/2) 

(6.18) 

(C  <   1/3) 

(6.19) 

(C  <   1/4) 

(6.20) 

we  obtain 

E[X]  =  A(l-C)"2 

E[X2]  =  A2(1-2C)~3 

E[X3]  =  A3(1-3C)~4 

E[X4]  =  A4(1-4C)"5 


The  central  moments,  i.e.  variance,  skewness,  and  kurtosis,  then  come  with 
the  aid  of  (6.8),  (6.9),  and  (6.10). 

It  is  not  possible  to  derive  a  simple  expression  for  the  distribution 
function  and  density  of  the  model  (6.14). 
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7.  Fitting  the  Sculptured  Exponential  Model  for  Spacings 

A  number  of  ways  of  fitting  models  or  representations  such  as  (6.1) 
and  (6.14)  suggest  themselves;  among  these  are  the  following. 

(A)  Maximum  likelihood:  possible  for  (6.1),  using  (6.12). 

(B)  Moment-matching:  possible  for  (6.1),  and  also  (6.14). 

(C)  Quantile  matching:  feasible  for  any  sculptured  representation. 

(D)  Hybrid  methods:  e.g.  constrained  likelihood  fit  by  requiring 
E[X]  =  x  and  allowing  C  in  (6.1)  to  be  determined  by  maximum 
likelihood. 

(E)  Generalized  non-linear  least-squares,  robustified  if  necessary: 
it  is  proposed  to  regress  x,.x  on  Az(j)[l  +  Cz(j)]  where  e.g. 
z(j)  =  -J£n(l-j/(n+l)),  the  approximate  expected  value  of  the 
basic  r.v.  Z  ,  and  an  appropriate  covariance  matrix  is  utilized, 

We  report  the  results  of  applying  several  of  these  methods  to  fit 
Models  2  and  3  to  spacings  (at  d  =  30  ft.)  data;  the  results  are  then 
diagnostically  examined. 
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(A)  Maximum  Likelihood. 


The  log  likelihood  function  associated  with  (6.1)  and  (6.12)  is 


L(A,C;  x) 


n 

I 


2x 


J. 


A  +  A2  +   4AC  x. 
J 


-  i  £n(A2  +  4AC  x.) 


(7.1) 


Differentiation  leads  to  likelihood  equations  for  A  and  C  ;  these  must 
be  solved  numerically,  e.g.  by  a  Newton-Raphson  method.  Alternatively,  a 
search  of  the  likelihood  function  itself  is  reasonably  effective.  Results 
are   summarized  below  for  the  30  ft.  depth  data  set. 

Table  2 
Maximum  Likelihood  Fits  of  Spacings  by  a 

Sculptured  Exponential,  (6.1). 
n  =  231  ,  A  =  0.53(0.09)*,  C  =  0.39(0.16)* 
(Corr(A,C)  =  -  0.84)* 


Estimates 

Raw  Data 

Model 

E[X] 

0.93 

0.95 

Coeff.  Var.[X] 

1.31 

1.50 

Skew[X] 

2.68 

4.57 

Kurt[X] 

9.85 

40.32 

Lower  Quartile, 

Q 

0.155 

0.170 

Median 

0.487 

0.467 

Upper  Quartile, 

Q 

1.28 

1.13 

(  )  represent  large-sample  standard  errors  calculated  from  the  likelihood 
(Fisher  information). 
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Although  the  agreement  of  the  lower  moments  is  satisfactory,  that  of 
the  higher  moments  (skewness  and  kurtosis)  is  much  less  so;  this  points  to 
the  apparent  sensitivity  of  the  m.l.e.  to  extreme  values,  assigning  an 
unreasonably  high  C  ("correction")  value.  Nevertheless,  a  Kolmogorov- 
Smirnov  test  of  goodness  of  fit  yields  a  value  of  0.65  for  the  fitted 
sculptured  model,  while  a  Kolmogorov-Smirnov  value  of  1.73  is  found  for  a 
simple  exponential  fit;  the  sculptured  model  is  seen  by  this  test  as  pro- 
viding a  substantially  improved  fit.  Furthermore,  a  diagnostic  plot  of 
observed  (x.)  y_s.  predicted  (x.(A,C)  order  statistics  provides  a  more 
satisfactory  straight-line  fit  than  does  an  exponential  model.  See  Figure  10, 
A  further  plot  of  residuals  is  given  below  in  Figure  11.  The  sculptured 
model  fitted  by  maximum  likelihood  to  the  particular  set  of  data  under  dis- 
cussion appears  to  predict  a  somewhat  longer  far  right  tail  than  is  evident 

from  the  data. 

An  alternative  diagnostic  plot  is  available  for  the  present  and  certain 

other  sculptured  models:  for  (6.1)  the  transformation  obtained  by  solving 
x  =  Az(l  +  Cz)  for  z  ,  using  the  estimated  parameter  values  provides 
estimates  of  the  basic  z-values,  denoted  by  z  ,  giving  rise  to  the  obser- 
vations. To  the  extent  that  the  latter  resemble  observations  on  a  unit 
exponential,  the  model  is  verified.  Figures  12  and  13  provide  such  a 
diagnostic  plot. 
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(B)  Moment  Matching:  Linear  Sculpturing 

An  alternative  to  the  maximum  likelihood  method  is  that  of  matching 
moments.   It  is  customary  to  equate  the  two  lowest  moments,  e.g.  sample  and 
model  mean  and  variance,  when  fitting  two-parameter  models.  There  is 
theoretical  justification  in  the  present  case  for  matching  sample  and  model 
means  and  coefficients  of  variations:  the  goodness-of-exponential-f it  test 
of  Stephens  [1978],  adapted  from  ideas  of  Shapiro  and  Wilk  [1972],  is  essen- 
tially based  on  the  sample  of  coefficient  of  variation;  (cf.  Shapiro  and 
Wilk  [1972],  p.  357  footnote).  The  coefficient  of  variation  is  a  simple 
rational  function  for  the  model  (6.1),  and  C  can  be  found  explicitly. 

The  skewness  of  model  (6.1)  is  also  a  simple  monotonic  expression  in 
C;  this  can  be  equated  to  the  sample  skewness  and  solved  for  an  estimate 
of  C  ,  C  .  An  estimate  of  A  then  is  obtainable  from  the  first  moments. 

When  applied  to  spacings  at  d  =  30  ft.  the  skewness-matching  method  pro- 
duces estimates  that  differ  noticeably  from  those  given  by  maximum 
likelihood:  A  is  larger,  and  C  is  smaller  than  the  corresponding  maxi- 
mum likelihood  estimates.  In  several  respects  the  skewness-match  is  to  be 
preferred:  it  agrees  best  with  the  data  evidence  in  the  far  tail,  i.e.  at 
the  upper  quartile  and  with  respect  to  the  kurtosis  measures  of  model  and 
data. 
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Table  3 
Moment  -  Matched  Fits  of  Spacings 
by  A  Sculptured  Exponential 


Estimates 

Raw  Data 

E[X] 

0.929 

Coeff.  Var.EX] 

1.31 

Skew[X] 

2.68 

Kurt[X] 

9.85 

Lower  Quartile, 

Q 

0.155 

Median 

0.487 

Upper  Quartile, 

Q 

1.29 

Model 

Model 

A  = 

C.V.  Match 
0.67,  r  =  0 

0.929 
1.31 
3.71 
26.01 
0.203 
0.53 
1.18 

1 
193j 

(*. 

Skew.  Match 
0.827,  r  =  0.062 

0.929 
1.12 
2.68 
12.54 
0.242 
0.598 
1.24 

Figures  14  and  15  provide  diagnostic  plots  of  quality  of  fit  for  the 
skewness-matched  sculptured  model ,  (6.1). 
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(C)  Moment  Matching:  Exponential  Sculpturing 

The  model  (6.15)  was  also  fit  to  the  data  by  choosing  C  to  match  the 
coefficient  of  variation  and  choosing  A  to  match  the  mean  of  the  data. 
The  values  of  the  estimated  parameters  and  predicted  moments  are  as  follows 

Table  4. 

Coefficient  of  Variation  -  Match  fit  of  Spacings  by  the 
Sculptured  Model  (6.14) 


A  = 

0.721 

C  = 

=  0.119 

Estimates 

Raw  Data 

Model 

E[X] 

0.929 

0.929 

CV.[X] 

1.31 

1.31 

Skew[X] 

2.68 

4.53 

Kurt.[X] 

9.84 

53.64 

Lower  Quartile, 

Q 

0.155 

0.215 

Median 

0.487 

0.543 

Upper  Quartile, 

Q 

1.29 

1.18 

Figures  16  and  17  provide  diagnostic  plots  of  quality  of  fit  for  the 
model  (6.14).  The  residuals  appear  to  have  the  same  shape  as  the  residuals 
for  the  m.l.e.  and  skewness-match  fit  of  model  (6.1).  The  residuals  in 
Figure  17  are  more  symmetrically  and  closely  grouped  about  the  axis  than 
are  the  residuals  for  the  maximum  likelihood  fitted  linear  model. 
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8.  The  Gamma  Model  for  Spacings 

The  gamma  distribution  is  an  alternative,  and  classical,  model  for 
spacings  that  includes  the  exponential  as  a  special  case.   Its  density  is 

fx(x;  B,A)  =  e-x/B -^f  A"'l  •  (8.1) 

Neither  the  distribution  function  nor  the  quantiles  or  order  statistics  of 
general  gamma-distributed  random  variables  are    explicitly  expressible  in 
simple  closed  form. 

The  gamma  has  been  fitted  to  the  30  ft.  spacings  data  by  maximum 
likelihood  and  also  by  matching  the  first  two  moments.  The  results  are 
summarized  below. 

Table  5. 

Estimates       Raw  Data      Model (Moments)         Model (M.L.E. ) 

(A  =  0.582,  C  =  1.60)      (A  =  0.70,  B=1.33) 

S.E.   (0.15)      (0.60) 

Corr(A,B#)   =  -    .71 

0.929  0.929 

1.311  1.196 

2.622  2.390 

10.32  8.58 

0.127  0.172 

0.478  0.541 

1.256  1.276 


E[X] 

0.93 

Coeff.  Var[X] 

1.31 

Skew[X] 

2.68 

Kurt[X] 

9.85 

Lower  Quartile, 

Q 

0.155 

Median 

0.487 

Upper  Quartile, 

Q 

1.279 
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In  Figures  18  and  1 9  diagnostic  plots  of  the  gamma  fit  by  maximum 
likelihood  are  presented.  There  is  a  remarkable  resemblance  between  the 
appearance  of  the  diagnostics  for  the  m.l.e.-  fitted  gamma  model  and  the 
skewness-fitted  sculptured  exponential  for  this  data  set.  Note  that  both 
of  these  model  representations  now  tend  to  predict  smal ler  extreme  right 
tails  than  indicated  by  the  raw  data,  in  contrast  to  the  m. 1 . e. -fitted 
sculptured  exponential. 
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9.  Statistical  Properties  of  Ice  Keels 

Turn  now  to  a  discussion  of  the  distributional  properties  of  keels, 
i.e.  the  locally  maximal  projections  of  individual  ice  structures  below  the 
sea  surface;  see  Figure  1.  The  present  discussion  is  confined  to  those 
keels  referenced  from  a  30  ft.  depth  (keel  depths  are  in  units  of  feet). 

Figure  20  is  a  plot  of  keel  depths  (excess  over  30  ft.)  in  sequential 
order,  Figure  21  is  a  histogram  of  raw  keel  depths,  and  Figure  22  is  a 
histogram  of  log  keel  depths.  The  latter  shows  little  evidence  of  the  pro- 
nounced "two-bump  effect"  of  the  spacings.  Earlier  work,  WH,  and  Hibler 
[1972],  among  others,  has  represented  keel  depths  by  the  simple  exponential 
model,  but  again  the  data  give  evidence  of  a  systematically  longer-than- 
exponential  right  tail;  see  Figures  23  and  24. 

Two  theoretical  models  were  fitted  to  the  raw  data:  the  sculptured 
exponential,  (6.1),  and  the  gamma.  Several  methods  of  fitting  were  employed 
lower  moment-matching  and  maximum  likelihood.  The  adequancy  of  the  fits  was 
assessed  by  numerical  and  graphical  methods,  and  the  following  tables  and 
figures  25-30  summarize  the  results. 
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The  exponentially  sculptured  model  (6.14)  was  also  fitted  to  the  keel 
data  by  choosing  C  to  match  the  coefficient  of  variation  of  the  data  and 
choosing  A  to  match  the  mean.  The  values  of  the  estimated  parameters  and 
the  predicted  moments  are   as  follows. 

Table  7 

Coefficient  of  Variation  -  Match  fit  of  Keel  Depths  by  the 
Exponentially  Sculptured  Model 


Estimates 

Raw  Data 

Model 

E[X] 

6.90 

6.90 

C.V.[X] 

1.26 

1.26 

Skew[X] 

2.22 

4.01 

Kurt.[X] 

5.55 

37.86 

Lower  Quartile, 

Q 

1.20 

1.64 

Median 

3.60 

4.13 

Upper  Quartile, 

Q 

8.60 

8.87 

Diagnostic  Plots  appear  as  Figures  31  and  32. 

As  was  found  to  be  true  for  spacings,  the  maximum  likelihood  fitted 
sculptured  model  (6.1)  applied  to  the  present  keel  depth  (reference 
d  =  30  ft.)  data  systematically  overestimates  the  magnitude  of  the  far  right 
tail  (the  number  of  deep  keels).   In  this  case  the  gamma  model  underestimates 
the  final  right  tail  values,  see  Figures  29  and  especially  30.  The  skewness- 
matched  sculptured  exponential  also  tends  to  underestimate  the  far  right 
tail  of  the  data.  An  examination  of  the  residual  plot  of  Figure  26  suggests 
that  the  m.l.e.-  fitted  sculptured  model  nicely  fits  all  keel  size  data 
except  the  very   largest.  The  coefficient  of  variation  match  fit  of  expo- 
nentially sculptured  model  (6.14)  produces  residuals  that  are  less  structured 
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than  either  the  gamma  model  or  sculptured  model  (6.1),  see  Figure  33.  The 
model  (6.14)  was  also  fit  by  matching  the  skewness,  this  fit  produced 
residuals  similar  in  appearance  to  Figure  28. 
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APPENDIX 


Estimates  for  the  Percent-Points  of  Maximum  Keel  Depths,  and 
Corresponding  Uncertainty  Estimates  (Confidence  Limits). 

In  Section  9  the  exponentially  sculptured  model 


X  =  AZe 


CZ 


(A-l) 


was  fitted  to  a  batch  of  365  keel  depths;  this  is  Model  3,  (6.14).  Further- 
more, diagnostic  plots  of  order  statistics  residuals  indicated  a  reasonably 
successful  fit  of  the  data  by  the  model.  We  now  wish  to  utilize  the  model 
to  predict  statistical  aspects  of  the  maximum  keel  depth  to  be  encountered 
in  a  further  series  of  keel  observations.  Specifically  we  illustrate  the 
procedures  and  results  by  assuming  that 

(a)  a  future  sequence  of  365  keel  depths  is  of  interest,  and  that 
these  data  come  independently  and  randomly  from  model  (A-l),  or 
(6.14); 

(b)  we  are  interested  in  predicting  the  95 —  percent  point  of  the 
maximum  of  the  data  values  in  (a); 

(c)  we  are  also  interested  in  associating  95%  confidence  limits  with 
the  point  estimate  of  (b). 


The  above  numerical  values  are  illustrative  only;  it  will  be  equally 
possible  to  predict  the  median  or  mean  of  the  future  maximum,  together  with 
confidence  limits. 

The  form  of  the  model  (A-l)  is  especially  convenient  for  addressing 
(b)  and  (c).  Clearly  the  95 —  percent  point  for  the  maximum  of  a  (future) 
sample  of  365  unit  exponential  random  variables  is 

1/(365; 


Z(365)(°'95)  =  " 


ln{l  -  (0.95)1/(365)}_ 


=  8.87.. 


(A-2) 
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such  numbers  can  be  obtained  accurately  and  handily  from  any  set  of  extensive 
tables,  or  even  from  a  hand-held  calculator.  Now  by  monotonici ty,  see  (6.16), 

t  h 

the  95 —  percent  point  of  the  maximum  of  a  sample  of  365  future  keels  is  given 
by 


x(365)(0.95)  =A 


Z(365)(°-95) 


eC[z(365)(0.95)] 


(A-3) 


provided  the  model  is  correct  and  A  and  C  are  known.  If  the  model  is 
correct  but  A  and  C  are  estimates  of  A  and  C  ,  then  an  estimate  of 
the  percent  point  is 


x(365)(0.95)  =  A 


z(365)(0.95) 


;C[z(365)(0.95)]  < 


(A-4) 


Thus  a  point  estimate  of  the  percent  point  of  the  maximum  can  be  generated 
by  simply  substituting  the  parameter  point  estimates  into  (A-3),  a  wery 
simple  and  direct  task. 

Next  address  (c),  the  uncertainty  in  the  above  estimate,  or,  more 
specifically,  approximate  confidence  limits  for  the  unknown  percent  point. 
We  compute  two  estimates:  the  jackknife  confidence  limits,  and  the  bootstrap 
confidence  limits;  see  Efron  [1980]  for  a  leisurely  discussion  of  both 
methods. 

The  jackknife  procedure  involves  deletion  of  one  observation  at  a  time 
from  the  batch  of  data,  and  the  re-computation  of  A  and  C  using  the 
remainder  of  the  data;  the  estimates  obtained  omitting  data  point  i  are 
called  A/-v  and  C/ . \  .  Next  one  computes  the  quantities 
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L(i)  =  zn 


h 


z(365)(0-95) 


eC(i)[z(365)<°-95>^  ; 


(A-5) 


the  logarithm  is  taken  in  order  to  render  jackknifing  more  valid  by  sym- 
metrizing  the  sampling  distribution  of  X/365\(0.95)  .  Finally  one  computes 
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365 


L  =  365  L(0)  "  365  J1  L(i) 


(A-6) 


where  L,qn  =  in  x, 3cc\(0.95) ,  the  logged  percent  point  estimate  with  no 
observations  removed.  The  jackknifed  estimate  of  the  variance  of  L  is 
here 


/\         /\ 


365 


VWL]=^T(L(i)-L)2 


(A-7) 


in  the  present  instance  the  numerical  value  is  0.0156.  The  approximate  95% 
confidence  interval  for  Jin  x(35c\(0-95)  is  then 


L  +  1.96  /v/\R[L]  =  [4.585,  5.075]  ; 


(A-8) 


the  95%  confidence  interval  for  the  actual  percent  point  of  the  maximum  is 
then  obtained  by  exponentiating  the  limits  of  (A-8),  giving  the  interval 
[98.0,  160.0]. 

The  bootstrap  procedure  involves  re-sampling:  from  the  empirical 
distribution  of  the  original  data  points,  obtain  200  independent  random 
"bootstrap"  samples  of  size  365  each,  and  from  each  bootstrap  sample  compute 
estimates  of  parameters  A  and  C  ,  and  thence  of  x,3fit-\  (0.95) ;  there  will 
thus  be  200  bootstrap  estimates  in  all.  Now  simply  order  the  latter  and 
find  the  5 smallest  and  largest  (195 smallest)  of  these  bootstrap 
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estimates,  which  will  be  quoted  as  the  95%  confidence  interval;  and  is 
[91.8,  149.9].  Notice  that  the  log  transformation  is  not  required  when  the 
bootstrap  method  is  utilized.  It  is  gratifying  that  the  bootstrap  and  jack- 
knife  methods  are  in  such  close  numerical  agreement  for  the  present^data. 
Both  methods  are  somewhat  more  computationally  demanding  than,  say,  the 
classical  "delta  method"  would  be,  but  are  entirely  feasible  using  modern 
computers. 
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